Abstract

Detection of differential item functioning (DIF) is most often done between 
two groups of examinees under item response theory. It is sometimes 
important, however, to determine whether DIF is present in more than two 
groups. In this paper we present a method for detection of DIF in multiple
groups. The method is closely related to Lord's chi-square for comparing 
vectors of item parameters estimated in two groups. An example using real 
data is provided. 
Introduction



An item is said to be differentially functioning if the probability of a correct
response is different for examinees at the same ability level but from different
groups (cf. Pine, 1977). Efforts to detect differential item functioning (DIF)
have been extensively reviewed by Berk (1982) and Holland and Wainer 
(1993) for methods based on both classical test theory and item response 
theory (IRT). DIF detection methods under either theoretical approach, 
however, have been most completely developed for the two group case 
in which comparisons are made between some aspect of the responses of 
examinees in a base (or reference) group and examinees in a second (or 
focal) group. It is not uncommon, however, to have a situation in which
more than two groups exist. With most current DIF detection methods,
multiple two-group comparisons are required to detect DIF across all groups.
This approach does not permit other than pairwise comparisons among
groups. A more appropriate and useful approach would be to search for
DIF simultaneously across all groups. In this paper, we present a method
under IRT for simultaneous detection of DIF in a multiple group situation. 

Detection of DIF under IRT is based on the assumption that the items 
on the test measure the same underlying ability in all groups from the same 
population. Two main approaches have been used for detection of DIF under 
IRT. One approach focuses on a comparison of item parameters estimated in 
two groups (Draba, 1977; Lord, 1977, 1980; Thissen, Steinberg, & Wainer, 
1988, 1993; Wright & Stone, 1979). The other approach focuses on the area 
between item response functions (IRFs) from two groups (Kim & Cohen, 
1991; Linn, Levine, Hastings, & Wardrop, 1981; Raju, 1988, 1990; Rudner,
1977; Wainer, 1993). Our focus in this paper is on the first set of methods 
which compare item parameters estimated in different groups. Specifically, 
we describe a chi-square method for comparison of item parameters estimated 
in multiple groups. 

IRT Model 

The probability of a correct response for a dichotomously scored item can be 
expressed by the three-parameter IRF (Birnbaum, 1968) as

$$P_j(\theta) = \gamma_j + (1 - \gamma_j)\left[1 + \exp\{-\alpha_j(\theta - \beta_j)\}\right]^{-1} \tag{1}$$

where $\alpha_j$, $\beta_j$, and $\gamma_j$ are the item discrimination, difficulty, and
pseudo-guessing parameters, respectively, for item $j$, and $\theta$ is the ability parameter.
Equation 1 expresses the probability of a correct answer on the $\theta$ scale. We
will make use below of the fact that this probability is also the true score 
function for item j. 

Estimation of the pseudo-guessing parameter is well known to be problematic
unless there is a large number of examinees for whom the item is
reasonably difficult (cf. Baker, 1987, 1988; Kolen, 1981; Shepard, Camilli, &
Averill, 1981; Thissen & Wainer, 1982). In this regard, Lord (1980) presented
a DIF detection procedure which does not consider the pseudo-guessing parameter.
The extension of Lord's chi-square procedure described in this paper
likewise is discussed with the two-parameter logistic IRF defined as

$$P_j(\theta) = \left[1 + \exp\{-\alpha_j(\theta - \beta_j)\}\right]^{-1} \tag{2}$$
The method, however, can be adapted easily to include comparison of the
pseudo-guessing parameter and is sufficiently general to accommodate any
IRT model for dichotomously scored items. Further, the method as described
here can be applied to the Rasch logistic model (Rasch, 1980) or extended to
include Samejima's (1969) graded response model.
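
As a small illustration of Equations 1 and 2, the following Python sketch evaluates the three-parameter IRF and its two-parameter special case. The function names and the item values are hypothetical and are not taken from the study.

```python
import math


def irf_3pl(theta, alpha, beta, gamma=0.0):
    """Three-parameter logistic IRF (Equation 1); gamma = 0 gives Equation 2."""
    return gamma + (1.0 - gamma) / (1.0 + math.exp(-alpha * (theta - beta)))


def irf_2pl(theta, alpha, beta):
    """Two-parameter logistic IRF (Equation 2)."""
    return irf_3pl(theta, alpha, beta, gamma=0.0)


# Hypothetical item with discrimination 1.0 and difficulty 0.0.
# At theta equal to the difficulty, the 2PL probability is one half.
p_at_difficulty = irf_2pl(theta=0.0, alpha=1.0, beta=0.0)

# A pseudo-guessing value of 0.2 raises the same point to 0.2 + 0.8 * 0.5.
p_with_guessing = irf_3pl(theta=0.0, alpha=1.0, beta=0.0, gamma=0.2)
```

Because the item true score function of a dichotomous item equals its IRF, the same functions also return the expected item score at a given ability.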

Definition of DIF 

When item parameters are estimated from two different groups of examinees, 
we obtain two sets of item parameter estimates, $(a_{j1}\; b_{j1})'$ from the
first group and $(a_{j2}\; b_{j2})'$ from the second. IRT assumes that the item
parameters are invariant across groups if examinees are drawn from the same
population (cf. Baker, 1985, 1992; Hambleton, 1989; Lord, 1980). Therefore,
the two sets of item parameter estimates should be identical within sampling
fluctuation after proper scaling adjustment. When the parameter estimates 
in the first group are not the same as the estimates in the second, the item 
is considered to be functioning differentially in the two groups. Since the 
shapes of the IRFs are dictated by their parameters, when the parameters 
differ so will the IRFs. 

The definition of DIF stated in terms of IRFs, however, is unnecessarily 
restrictive as it is applicable only for dichotomously scored items. A more
useful definition of DIF would be the following: 

An item is considered to be functioning differentially when the 
item true score functions in the different groups are not equal. 

As noted earlier, for the dichotomously scored items, the item true 
score function is identical to the IRF. The statement in terms of item true 
score functions provides a consistent definition of DIF to include not only
dichotomous models but also polytomous IRT models in general (Cohen,
Kim, & Baker, 1992). Further, it should be noted that the item true score
functions are identical if and only if the sets of the item parameters from the
groups are equal. Consequently, the null hypothesis for testing the equality
of the two-parameter IRFs from $K$ groups of examinees can be stated as

$$H_0: \begin{pmatrix} \alpha_{j1} \\ \beta_{j1} \end{pmatrix} = \cdots = \begin{pmatrix} \alpha_{jk} \\ \beta_{jk} \end{pmatrix} = \cdots = \begin{pmatrix} \alpha_{jK} \\ \beta_{jK} \end{pmatrix}. \tag{3}$$

The null hypothesis can also be stated as

$$H_0: \xi_{j1} = \cdots = \xi_{jk} = \cdots = \xi_{jK}, \tag{4}$$

where $\xi_{jk} = (\alpha_{jk}\; \beta_{jk})'$. The alternative hypothesis is, of course,

$$H_1: H_0 \text{ is not true.} \tag{5}$$

Lord's Chi-Square 

For two groups of examinees, Lord (1980) presented a chi-square method
for comparing vectors of item parameters. Lord's chi-square is obtained as
follows: Suppose we define $v_{jk}$ and $\Sigma_{jk}$ as the vector of maximum likelihood
item parameter estimators and the asymptotic variance and covariance
matrix for $v_{jk}$, respectively, for the $k$th group of examinees. That is,

$$v_{jk} = (a_{jk}\; b_{jk})' \tag{6}$$

and

$$\Sigma_{jk} = \begin{pmatrix} \mathrm{var}(a_{jk}) & \mathrm{cov}(a_{jk}, b_{jk}) \\ \mathrm{cov}(a_{jk}, b_{jk}) & \mathrm{var}(b_{jk}) \end{pmatrix}. \tag{7}$$

Then, for large samples

$$v_{jk} - \xi_{jk} \sim N(0, \Sigma_{jk}) \tag{8}$$

or equivalently

$$(v_{jk} - \xi_{jk})' \Sigma_{jk}^{-1} (v_{jk} - \xi_{jk}) \sim \chi^2_2. \tag{9}$$

The statistic in Equation 9 is sometimes called the Wald statistic and can be
used to make inferences about $\xi_{jk}$ (Rubin, 1988; Wald, 1943). Hence, item
parameters estimated in two groups of examinees, which have been placed on
the same scale, can be compared using the following chi-square test statistic
described by Lord (1980):

$$\chi^2_j = (v_{j1} - v_{j2})'(\Sigma_{j1} + \Sigma_{j2})^{-1}(v_{j1} - v_{j2}). \tag{10}$$

The null hypothesis tested is $H_0: \xi_{j1} = \xi_{j2}$. Lord's chi-square has two
degrees of freedom for the two-parameter model. This chi-square has been
shown to be effective for detection of DIF (Candell & Hulin, 1987; McCauley
& Mendoza, 1985) and is based on the following assumptions (Lord, 1980):

1. It is asymptotic.

2. The $\theta$ are assumed to be known.

3. It is appropriate only for maximum likelihood estimates.
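
The computation in Equation 10 can be sketched in a few lines of Python. This is a minimal illustration that assumes NumPy is available; the function name and all numerical values are hypothetical and are not taken from the study.

```python
import numpy as np


def lord_chi_square(v1, sigma1, v2, sigma2):
    """Lord's chi-square (Equation 10) for one item in two groups.

    v1, v2: length-2 vectors (a, b) of estimates on a common scale.
    sigma1, sigma2: their 2 x 2 variance-covariance matrices.
    """
    d = np.asarray(v1, dtype=float) - np.asarray(v2, dtype=float)
    pooled = np.asarray(sigma1, dtype=float) + np.asarray(sigma2, dtype=float)
    # Solve pooled * x = d rather than forming the inverse explicitly.
    return float(d @ np.linalg.solve(pooled, d))


# Hypothetical estimates for one item (not from the study).
v1, sigma1 = [1.0, 0.0], [[0.05, 0.01], [0.01, 0.04]]
v2, sigma2 = [1.2, -0.5], [[0.06, 0.02], [0.02, 0.05]]
chi2 = lord_chi_square(v1, sigma1, v2, sigma2)

# With 2 degrees of freedom the chi-square upper-tail probability is exp(-x / 2).
p_value = float(np.exp(-chi2 / 2.0))
```

The closed-form tail probability applies only to the two-parameter case, where the statistic has two degrees of freedom.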

A DIF Statistic in Multiple Groups 

Suppose we have a set of item parameter estimates for the two-parameter 
model for item j from K different groups of examinees. We assume that 
all item parameter estimates are placed on the same metric so that the 
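comparisons can be made. We define $v_j$ as the vector of length $2K$ of
estimators of all item parameters from the $K$ different groups, $\xi_j$ as the
vector of item parameters, and $\Sigma_j$ as the block-diagonal and non-singular
dispersion matrix of $v_j$. The first step is to formulate the following model to
describe $v_j$ as

$$v_j = X\xi_j + \epsilon_j, \tag{11}$$

where

$$v_j = (a_{j1}\; b_{j1}\; \cdots\; a_{jK}\; b_{jK})', \tag{12}$$

$X$ is a known design matrix such as $X = I_{2K}$,

$$\xi_j = (\alpha_{j1}\; \beta_{j1}\; \cdots\; \alpha_{jK}\; \beta_{jK})', \tag{13}$$

and $\epsilon_j$ is the error vector with the dispersion matrix $D(\epsilon_j) = \Sigma_j$ as

$$\Sigma_j = \begin{pmatrix}
\mathrm{var}(a_{j1}) & \mathrm{cov}(a_{j1}, b_{j1}) & \cdots & 0 & 0 \\
\mathrm{cov}(a_{j1}, b_{j1}) & \mathrm{var}(b_{j1}) & \cdots & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & \cdots & \mathrm{var}(a_{jK}) & \mathrm{cov}(a_{jK}, b_{jK}) \\
0 & 0 & \cdots & \mathrm{cov}(a_{jK}, b_{jK}) & \mathrm{var}(b_{jK})
\end{pmatrix}. \tag{14}$$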

It can be seen that, asymptotically,

$$(v_j - \xi_j)' \Sigma_j^{-1} (v_j - \xi_j) \sim \chi^2_{2K}, \tag{15}$$

provided that $E(v_j) = \xi_j$ (Dobson, 1992). We can re-express Equation 15
in terms of a new parameter vector $C\xi_j$ of length $p$ such that the new
variance and covariance matrix $C\Sigma_j C'$ is non-singular. Here, $C$ is a contrast
matrix which contains $p$ rows of contrast vectors that are linearly independent
(Johnson & Wichern, 1992). Then the quadratic term $Q_j$ is defined as

$$Q_j = (Cv_j - C\xi_j)'(C\Sigma_j C')^{-1}(Cv_j - C\xi_j). \tag{16}$$
Any test for the homogeneity of item parameters can be expressed as

$$H_0: C\xi_j = 0. \tag{17}$$

The asymptotic distribution of the quadratic term $Q_j$, which is a multi-group
DIF statistic, under the null hypothesis, $C\xi_j = 0$, is given by

$$Q_j = (Cv_j)'(C\Sigma_j C')^{-1}(Cv_j) \sim \chi^2_p, \tag{18}$$

where $p$ is the rank of $C$ (Dobson, 1990).
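
The quadratic form in Equation 18 can be sketched as follows. This is a minimal Python illustration that assumes NumPy is available; the function names, the three-group estimates, and the choice of contrasts (the first group against each of the other two) are hypothetical and serve only to show the mechanics.

```python
import numpy as np


def multi_group_q(v, sigma, contrast):
    """Quadratic form of Equation 18 for one item.

    v: length-2K vector (a_1, b_1, ..., a_K, b_K) on a common scale.
    sigma: 2K x 2K block-diagonal variance-covariance matrix (Equation 14).
    contrast: p x 2K matrix C of linearly independent contrasts.
    Returns (Q, p), where p = rank(C) is the degrees of freedom.
    """
    v = np.asarray(v, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    c = np.asarray(contrast, dtype=float)
    p = int(np.linalg.matrix_rank(c))
    if p != c.shape[0]:
        raise ValueError("rows of the contrast matrix must be linearly independent")
    cv = c @ v
    q = float(cv @ np.linalg.solve(c @ sigma @ c.T, cv))
    return q, p


def block_diag(blocks):
    """Assemble per-group 2 x 2 matrices into the block-diagonal matrix of Equation 14."""
    k = len(blocks)
    out = np.zeros((2 * k, 2 * k))
    for i, block in enumerate(blocks):
        out[2 * i:2 * i + 2, 2 * i:2 * i + 2] = block
    return out


# Hypothetical three-group estimates for one item (not from the study).
v = [1.0, 0.0, 1.2, -0.5, 0.9, -0.4]
sigma = block_diag([
    [[0.05, 0.01], [0.01, 0.04]],
    [[0.06, 0.02], [0.02, 0.05]],
    [[0.04, 0.01], [0.01, 0.06]],
])
# Contrasts of group 1 against group 2 and group 1 against group 3.
contrast = [
    [1, 0, -1, 0, 0, 0],
    [0, 1, 0, -1, 0, 0],
    [1, 0, 0, 0, -1, 0],
    [0, 1, 0, 0, 0, -1],
]
q, df = multi_group_q(v, sigma, contrast)
```

The returned value would be referred to a chi-square distribution with `df` degrees of freedom, here four.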

For example, when we have two groups of examinees, we obtain two sets 
of item parameter estimates. We assume that a proper scaling adjustment 
has been done so that two item parameter estimates from the first and second 
groups are expressed on the same scale. Then, 



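$$v_j = (a_{j1}\; b_{j1}\; a_{j2}\; b_{j2})' \tag{19}$$

and

$$\Sigma_j = \begin{pmatrix}
\mathrm{var}(a_{j1}) & \mathrm{cov}(a_{j1}, b_{j1}) & 0 & 0 \\
\mathrm{cov}(a_{j1}, b_{j1}) & \mathrm{var}(b_{j1}) & 0 & 0 \\
0 & 0 & \mathrm{var}(a_{j2}) & \mathrm{cov}(a_{j2}, b_{j2}) \\
0 & 0 & \mathrm{cov}(a_{j2}, b_{j2}) & \mathrm{var}(b_{j2})
\end{pmatrix}. \tag{20}$$

The hypothesis of the equality of two sets of item parameters can be tested
with the matrix of contrast coefficients defined as

$$C = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}, \tag{21}$$

which has rank of two. This yields

$$Cv_j = \begin{pmatrix} a_{j1} - a_{j2} \\ b_{j1} - b_{j2} \end{pmatrix} = v_{j1} - v_{j2} \tag{22}$$

and

$$C\Sigma_j C' = \Sigma_{j1} + \Sigma_{j2}. \tag{23}$$

With the null hypothesis $C\xi_j = (0\; 0)'$, the test statistic can be written
as

$$Q_j = (v_{j1} - v_{j2})'(\Sigma_{j1} + \Sigma_{j2})^{-1}(v_{j1} - v_{j2}), \tag{24}$$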

which is in fact the same as Lord's chi-square in Equation 10 with two degrees 
of freedom. 
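
The reduction to Lord's chi-square in the two-group case can be checked numerically. The following Python sketch assumes NumPy is available and uses hypothetical estimates that are not taken from the study.

```python
import numpy as np

# Hypothetical estimates for one item in two groups (not from the study).
v1, v2 = np.array([1.0, 0.0]), np.array([1.2, -0.5])
s1 = np.array([[0.05, 0.01], [0.01, 0.04]])
s2 = np.array([[0.06, 0.02], [0.02, 0.05]])

# Stacked estimate vector, block-diagonal dispersion matrix, and contrast matrix.
v = np.concatenate([v1, v2])
sigma = np.zeros((4, 4))
sigma[:2, :2], sigma[2:, 2:] = s1, s2
c = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])

# C v_j reduces to the difference vector, and C Sigma_j C' to the summed matrices.
assert np.allclose(c @ v, v1 - v2)
assert np.allclose(c @ sigma @ c.T, s1 + s2)

# The contrast form and Lord's form of the statistic.
q_contrast = float((c @ v) @ np.linalg.solve(c @ sigma @ c.T, c @ v))
q_lord = float((v1 - v2) @ np.linalg.solve(s1 + s2, v1 - v2))
```

The two values agree because the off-diagonal blocks of the dispersion matrix are zero, so the contrast simply adds the two within-group matrices.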

Example 

We provide an example to illustrate the detection of DIF in three groups
of examinees using the method described above. The data for this example 
were taken from a study by Cohen and Kim (1992) of calculator effects on 
mathematics test items. 

Data and Item Parameter Estimation 

Three groups of 200 students each were selected from students enrolled 
in calculus and pre-calculus mathematics courses in Fall 1990 at a large 
midwestern university. The first group was composed of examinees who
were not allowed to use calculators during the test (i.e., No-Calculator
Group). The second and third groups were composed of examinees who used
two different brands of scientific calculators when they took the test (i.e.,
Calculator-1 and Calculator-2 Groups). All students were tested during the 
first week of classes, prior to any instruction in the course. 

The test consisted of 14 items assembled from the pre-calculus section of 
a standardized multiple-choice university mathematics placement test with 
five options per item. The items used were all operational items on the test 
and had originally been written for use without calculators. DIF analysis 
was used to determine which items were sensitive to calculator usage. It was 
also of interest to determine the effect of two different brands of calculators 
on item performance. 

As the number of examinees in each group was relatively small, we chose
to use the two-parameter logistic model to fit the data sets. IRT item
parameter estimates were obtained using BILOG 3 (Mislevy & Bock, 1990)
with marginal maximum likelihood estimation. The item fit statistics
provided by BILOG 3 indicated that the two-parameter model provided a
good fit to the data. Use of an IRT model also assumes that the data are
unidimensional. Reckase (1979) has suggested that this condition may be
satisfied if the first component in a principal component analysis accounts
for at least 20 percent of the variance. A principal component analysis 
using tetrachoric correlation coefficients indicated that the data sets were 
sufficiently unidimensional for purposes of this study. 

Summary statistics for each group of examinees are given in Table 1. 
Examination of mean item difficulties for the test indicated that the test was 
at about the appropriate level of difficulty for the examinees in the sample. 
Classical test item difficulties and item-excluded biserial correlations are 
presented in Table 2. IRT item discrimination and item difficulty estimates
along with estimated variances and covariances are given in Table 3. 



Insert Tables 1, 2, and 3 about here 



Iterative Linking 

Under the assumption of item parameter invariance, item parameters 
estimated in different groups will differ from one group to another due only to 
errors in measurement. The metrics from these groups must first be equated 
to a common scale before between-groups DIF comparisons of parameter 
estimates are made. In the multi-group DIF study context, estimates from 
the calibration of the two focal (i.e., Calculator-1 and Calculator-2) groups 
must be transformed to the metric of the reference (i.e., No-Calculator)
group. In this study, therefore, two sets of linear coefficients are required for 
transforming the estimates from each of the two focal groups to the reference 
group scale. For simplicity, we designate the No-Calculator group as the 
first group and the Calculator-1 and Calculator-2 groups as the second and 
third groups, respectively. To illustrate the transformation, the transformed 
estimates of item discrimination and item difficulty parameters from the 
second group to the metric of the first group for item $j$ are given by

$$a^*_{j2} = \frac{a_{j2}}{A} \tag{25}$$

and

$$b^*_{j2} = A\,b_{j2} + B, \tag{26}$$

where * indicates a transformed value and $A$ and $B$ are the linear transformation
coefficients for linking. The task of linking two metrics is to determine
appropriate coefficients $A$ and $B$. Note that a different set of $A$ and $B$
coefficients must be determined to transform the third group estimates onto
the scale of the first group. The test characteristic curve method by Stocking
and Lord (1983) was used in this study as implemented in the computer program
EQUATE (Baker, Al-Karni, & Al-Dosary, 1991) for determining the
appropriate $A$ and $B$ coefficients.
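
The linear transformation of Equations 25 and 26 can be illustrated with a short Python sketch. The function name and the numerical values are hypothetical; the linking coefficients are simply assumed here rather than estimated by the test characteristic curve method.

```python
def to_reference_metric(a, b, A, B):
    """Place focal-group estimates on the reference metric (Equations 25 and 26)."""
    return a / A, A * b + B


# Hypothetical focal-group estimates and assumed linking coefficients.
a, b = 1.2, -0.5
A, B = 0.9, 0.1
a_star, b_star = to_reference_metric(a, b, A, B)

# Rescaling ability in the same way as difficulty leaves a * (theta - b)
# unchanged, so the transformed item has the same response probabilities.
theta = 0.3
theta_star = A * theta + B
logit_before = a * (theta - b)
logit_after = a_star * (theta_star - b_star)
```

The invariance of the logit shows why the same pair of coefficients must be applied to both the discrimination and the difficulty estimates.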

The linking procedure may be seriously affected by the presence of DIF
items (Lautenschlager & Park, 1988; Shepard, Camilli, & Williams, 1984).
Therefore, an iterative linking procedure described by Candell and Drasgow
(1988) was used in the DIF analyses in this paper. In this procedure, an
initial set of linear coefficients is determined and used to transform the
parameter estimates from the focal group to the reference group metric. DIF
analyses are done, items identified as DIF items are removed, and the
linear coefficients are recalculated from the remaining items. Item parameter
estimates from the focal group are again transformed onto the reference
group metric and DIF analyses are again conducted. This process continues
until either no DIF items are detected or the same set of DIF items is
detected on successive iterations. Evidence suggests that using the test
characteristic curve method with iterative linking provides more accurate
detection of DIF than either the weighted mean and sigma method or the
minimum chi-square method (Kim & Cohen, 1992).
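
The control flow of the iterative linking procedure can be sketched in Python as follows. This is only an outline under stated assumptions: mean and sigma linking on the difficulty estimates is substituted for the test characteristic curve method, and a toy threshold rule is substituted for the chi-square DIF test. All names and data are hypothetical.

```python
import statistics


def mean_sigma_link(ref_b, foc_b):
    """Mean and sigma linking on difficulties (a stand-in for the
    test characteristic curve method used in the study)."""
    A = statistics.stdev(ref_b) / statistics.stdev(foc_b)
    B = statistics.mean(ref_b) - A * statistics.mean(foc_b)
    return A, B


def iterative_linking(ref_b, foc_b, flag_dif, max_iter=10):
    """Relink on unflagged items until no items, or the same items, are flagged.

    flag_dif(A, B) is a caller-supplied, hypothetical function returning the
    set of item indices identified as DIF after linking with A and B.
    """
    n_items = len(ref_b)
    anchors = list(range(n_items))
    previous = None
    for _ in range(max_iter):
        A, B = mean_sigma_link([ref_b[i] for i in anchors],
                               [foc_b[i] for i in anchors])
        flagged = set(flag_dif(A, B))
        if not flagged or flagged == previous:
            break
        previous = flagged
        anchors = [i for i in range(n_items) if i not in flagged]
    return A, B, flagged


# Toy data: the last item is shifted in the focal group.
ref_b = [-2.0, -1.0, 0.0, 1.0, 2.0]
foc_b = [-2.0, -1.0, 0.0, 1.0, 0.5]


def toy_flag(A, B, threshold=0.8):
    """Flag items whose linked difficulties differ by more than a threshold."""
    return {i for i, (r, f) in enumerate(zip(ref_b, foc_b))
            if abs(r - (A * f + B)) > threshold}


A, B, dif_items = iterative_linking(ref_b, foc_b, toy_flag)
```

In this toy case the shifted item distorts the first set of coefficients, is flagged, and is excluded from the second linking, after which the same item is flagged again and the loop stops.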
DIF Measures

The null hypothesis used in the present study of the equality of three sets of 
item parameters was tested with the following contrast matrix: 



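$$C = \begin{pmatrix}
1 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 & 0 \\
1 & 0 & 0 & 0 & -1 & 0 \\
0 & 1 & 0 & 0 & 0 & -1
\end{pmatrix}. \tag{27}$$

This contrast matrix is of rank four and yields the following comparisons:

$$Cv_j = \begin{pmatrix} a_{j1} - a_{j2} \\ b_{j1} - b_{j2} \\ a_{j1} - a_{j3} \\ b_{j1} - b_{j3} \end{pmatrix}. \tag{28}$$

The null hypothesis for these comparisons is

$$H_0: \begin{pmatrix} \alpha_{j1} - \alpha_{j2} \\ \beta_{j1} - \beta_{j2} \\ \alpha_{j1} - \alpha_{j3} \\ \beta_{j1} - \beta_{j3} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \tag{29}$$

The test statistic, $Q_j$, with four degrees of freedom can be obtained using
Equation 18. In the present study $Q_j$ was tested with a .05 type I error rate.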

It is of course possible to specify a different contrast matrix than that
given in Equation 27. For example, we could use

$$C = \begin{pmatrix}
1 & 0 & -1 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 & 0 \\
0 & 0 & 1 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 & -1
\end{pmatrix}. \tag{30}$$

This contrast matrix looks different than that in Equation 27 but would
produce the same value of $Q_j$.
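
The invariance of the statistic to the choice of contrast matrix can be checked numerically. The Python sketch below assumes NumPy is available, uses hypothetical estimates, and assumes that the alternative matrix compares group 1 with group 2 and group 2 with group 3. If one contrast matrix equals a non-singular matrix $M$ times the other, then $M$ cancels in the quadratic form.

```python
import numpy as np


def q_stat(c, v, sigma):
    """Quadratic form (Cv)'(C Sigma C')^{-1}(Cv)."""
    cv = c @ v
    return float(cv @ np.linalg.solve(c @ sigma @ c.T, cv))


# Hypothetical estimates for one item in three groups (not from the study).
v = np.array([1.0, 0.0, 1.2, -0.5, 0.9, -0.4])
sigma = np.zeros((6, 6))
sigma[0:2, 0:2] = [[0.05, 0.01], [0.01, 0.04]]
sigma[2:4, 2:4] = [[0.06, 0.02], [0.02, 0.05]]
sigma[4:6, 4:6] = [[0.04, 0.01], [0.01, 0.06]]

# Group 1 versus group 2 and group 1 versus group 3.
c_a = np.array([[1, 0, -1, 0, 0, 0],
                [0, 1, 0, -1, 0, 0],
                [1, 0, 0, 0, -1, 0],
                [0, 1, 0, 0, 0, -1]], dtype=float)
# Group 1 versus group 2 and group 2 versus group 3.
c_b = np.array([[1, 0, -1, 0, 0, 0],
                [0, 1, 0, -1, 0, 0],
                [0, 0, 1, 0, -1, 0],
                [0, 0, 0, 1, 0, -1]], dtype=float)

# c_b = m @ c_a for a non-singular m, so the quadratic form is unchanged.
m = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0],
              [-1, 0, 1, 0],
              [0, -1, 0, 1]], dtype=float)
q_a = q_stat(c_a, v, sigma)
q_b = q_stat(c_b, v, sigma)
```

Any other set of four linearly independent contrasts spanning the same pairwise differences would give the same value for the same reason.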
It is important to note that the calculation of $Q_j$ requires that both
item parameter estimates and the variance and covariance matrices from the
second and third groups be placed onto the metric of the first group. As an
example, the following transformations of the estimated variance terms from
the second group are required in the calculation of $Q_j$:

$$\mathrm{var}(a^*_{j2}) = \mathrm{var}(a_{j2})/A^2 \tag{31}$$

and

$$\mathrm{var}(b^*_{j2}) = A^2\,\mathrm{var}(b_{j2}). \tag{32}$$

Note that no transformation is needed for the individual covariance terms.
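
Equations 31 and 32, together with the unchanged covariance, follow from treating the linking coefficients as fixed constants. A minimal Python sketch, assuming NumPy is available and using hypothetical values, is given below.

```python
import numpy as np


def transform_covariance(sigma, A):
    """Place a 2 x 2 variance-covariance matrix of (a, b) on the reference metric.

    With a* = a / A and b* = A * b + B, the Jacobian is J = diag(1 / A, A),
    so Sigma* = J Sigma J'. B is a shift and does not affect Sigma*.
    A is treated as a known constant (its sampling error is ignored).
    """
    j = np.diag([1.0 / A, A])
    return j @ np.asarray(sigma, dtype=float) @ j.T


# Hypothetical focal-group matrix and assumed slope coefficient.
sigma = np.array([[0.06, 0.02], [0.02, 0.05]])
sigma_star = transform_covariance(sigma, A=0.9)
```

The off-diagonal element is multiplied by $(1/A) \cdot A = 1$, which is why the covariance terms need no transformation.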

For comparison purposes, three Lord's chi-squares were obtained from the
pairs of the groups. Iterative linking was also used with Lord's chi-square for
each pair of the groups. Since a .05 type I error rate was used for $Q_j$, each
Lord's chi-square was tested with a $1 - .95^{1/3} = .017$ type I error rate (Kirk,
1982).
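
The per-comparison error rate and the associated critical values can be reproduced with the Python standard library. The sketch below relies on the closed-form upper-tail probabilities of the chi-square distribution with two and four degrees of freedom; the function names are hypothetical.

```python
import math

family_alpha = 0.05
n_comparisons = 3

# Per-comparison level such that 1 - (1 - alpha)^3 equals the family-wise .05.
alpha_each = 1.0 - (1.0 - family_alpha) ** (1.0 / n_comparisons)  # about .017

# For 2 degrees of freedom the upper-tail probability is exp(-x / 2),
# so the critical value has a closed form.
critical_2df = -2.0 * math.log(alpha_each)


def chi2_sf_4df(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def chi2_critical_4df(alpha, lo=0.0, hi=100.0):
    """Solve chi2_sf_4df(x) = alpha by bisection."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf_4df(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


critical_4df = chi2_critical_4df(family_alpha)
```

The four-degree-of-freedom value rounds to 9.49, and the two-degree-of-freedom value is approximately 8.15, close to the critical value used for the pairwise tests.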

As a comparison between the multi-group DIF statistic and Lord's 
chi-squares, three pairwise multi-group DIF statistics were also obtained. 
Estimates for this set of comparisons were based on the equating coefficients 
from the final iteration of the multi-group DIF Qj procedure. These pairwise 
comparisons were also tested with a .017 type I error rate. For example, the 
pairwise comparison between the first and second group was obtained with the
contrast matrix defined as

$$C = \begin{pmatrix} 1 & 0 & -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & -1 & 0 & 0 \end{pmatrix}. \tag{33}$$

This pairwise $Q_j$ may be the same as Lord's chi-square for the two groups
compared. Differences which occur between the pairwise $Q_j$ and Lord's chi-square
do so because of the differences in linking coefficients obtained for the
two approaches.

Results 

Results for the multi-group $Q_j$, Lord's chi-square, and the pairwise $Q_j$ are
given in Table 4. Possible calculator effects were detected in two items
(Item 10 and Item 14) using the multi-group DIF statistic $Q_j$ and in one item
(Item 14) using Lord's chi-square. The pairwise $Q_j$ resulted in the same two
DIF items (see Table 4) as the multi-group $Q_j$. Both Lord's chi-square and
pairwise $Q_j$ detected no significant differences on item performance between
the two brands of calculators.

Insert Table 4 about here 

The DIF detection procedure based on the multi-group $Q_j$ required three
linking iterations. (Recall that the pairwise $Q_j$ were obtained from the
transformed estimates in the final iteration.) Linkings based on Lord's
chi-square required one or two iterations. Table 5 contains linking coefficients
and DIF items detected. Differences in the A and B coefficients after the
first iteration are a result of the differences in the detection of DIF between
the multi-group DIF statistic and Lord's chi-square.

Insert Table 5 about here 






The two items detected with $Q_j$ were computational items and were easier
when calculators were used. Item 10 was identified as a DIF item by the 
multi-group and pairwise DIF statistics. This item required examinees to 
find an unknown angle given the sine of a second unknown angle minus a 
known angle. This problem can be solved easily using a scientific calculator 
by entering each of the five choices and pressing the sine function. Examinees 
with calculators seemed to have an advantage on this item. 

Item 14, a trigonometry item, was identified by both the multi-group and
pairwise DIF statistics and Lord's chi-square. It required the examinee to
find $\cos(-x)$ given the value of $\cos(x)$. Examinees with calculators had an
advantage on this item. Using a calculator, an examinee could possibly have 
inserted a value for x and pressed function keys until an answer was found 
which agreed with one of the choices for the item. 

Discussion 

The presence of DIF in a test is a serious problem affecting the validity of 
the item as well as of the entire test. The typical DIF study is conducted 
between two groups. It is important to note that situations arise in which 
comparisons among several groups may be desirable or necessary. In such 
cases, one approach might be to conduct multiple pairwise comparisons. It 
may be preferable, however, to conduct simultaneous comparisons among the 
groups. In this paper, we presented a statistic, $Q_j$, for simultaneous detection
of DIF in multiple groups. This statistic is closely related to Lord's chi-square 
and is based on the same set of assumptions. 
One of the assumptions of Lord's chi-square is that the $\theta$ are known. In
this regard, Lord and Wingersky (1985) presented sampling variances and
covariances of parameter estimates in IRT when abilities are unknown under
the joint maximum likelihood estimation context. When the $\theta$ are unknown,
McLaughlin and Drasgow (1987) have shown that the type I error rate of
Lord's chi-square may be seriously inflated for joint maximum likelihood
estimates. The type I error rate of Lord's chi-square does not appear to be
inflated, however, when marginal maximum likelihood estimates or marginal
Bayesian estimates are used (Cohen & Kim, in press). A well-known result
of marginalized solutions (cf. Drasgow, 1989; Mislevy & Stocking, 1990) is
that improved estimates of item parameters are typically obtained over those
from joint maximum likelihood estimation. This improvement was shown in
spite of the fact that ability was also treated as unknown. Further research is
needed on the null distributions of $Q_j$, particularly for short tests and small
samples.

After obtaining a significant multi-group DIF statistic, it may be of
interest to compare pairs of groups. The pairwise $Q_j$ could be used for
this purpose. As discussed earlier, the pairwise $Q_j$ is identical to Lord's
chi-square but, as found in the present study, may be based on different
equating coefficients. The results of the example give some indication of
differences in the two procedures for the detection of DIF. It also should
be noted that comparisons using the multi-group DIF statistics are not
limited to pairwise cases. Comparison between the No-Calculator group
and a combined calculator group (the Calculator-1 and Calculator-2 groups)
could have been done using a contrast matrix such as the following:

$$C = \begin{pmatrix} 1 & 0 & -.5 & 0 & -.5 & 0 \\ 0 & 1 & 0 & -.5 & 0 & -.5 \end{pmatrix}. \tag{34}$$

It should be noted that the type I error rate must be adjusted for the total
number of contrasts (cf. Kirk, 1982).
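
To make the combined-group contrast concrete, the terms entering the quadratic form can be written out as a short worked step. It assumes three groups on a common metric and uses the block-diagonal structure of $\Sigma_j$; the within-group matrices $\Sigma_{jk}$ are those defined for Lord's chi-square.

```latex
\[
C v_j =
\begin{pmatrix}
a_{j1} - \tfrac{1}{2}(a_{j2} + a_{j3}) \\
b_{j1} - \tfrac{1}{2}(b_{j2} + b_{j3})
\end{pmatrix},
\qquad
C \Sigma_j C' = \Sigma_{j1} + \tfrac{1}{4}\left(\Sigma_{j2} + \Sigma_{j3}\right).
\]
```

Because this contrast matrix has rank two, the resulting statistic would be referred to a chi-square distribution with two degrees of freedom.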

The example given in this paper was used to illustrate a situation in which 
a comparison of item parameters was desirable among more than two groups. 
Using the multiple group $Q_j$ statistic described in this paper, it was possible
to simultaneously compare item parameters in each of the three groups in
the example. Lord's chi-squares and the pairwise $Q_j$ were also presented
and used to illustrate the differences between the multi-group approach
and the Lord's chi-square approach. Differences between the two types of DIF
detection methods occurred because of the differences in linking coefficients.
Even though the same general iterative linking procedure was used for the two
approaches, a different rationale for the statistics yielded different equating
coefficients and, consequently, different sets of DIF items.

In the example provided in this paper, equal sample sizes of 200 for the 
reference and the two focal groups were used to control the effect of sample 
size. It is likely that in most DIF studies equal sample sizes do not occur.
Further, the ability distributions for the reference and the two focal groups 
were well matched to the distribution of item difficulties. The effects of 
inequalities on these factors were not addressed in this paper. 

An alternative approach to testing DIF in multiple groups might be
one suggested by Lord (1980) which employs a MANOVA approach with
post hoc comparisons based on Roy's method (Kim & Cohen, 1993). One
drawback to this procedure, however, is that the assumptions of this method
are somewhat more difficult to meet than those for the multi-group DIF
statistic presented in this study.
TABLE 1

Raw Score Summary Statistics for the Data Sets

                                          Group
Statistic              No-Calculator   Calculator-1   Calculator-2
Number of Items              14             14             14
Mean Score                 9.24           9.71           9.56
Standard Deviation         3.02           2.95           2.83
Coefficient Alpha           .66            .62            .60
Number of Examinees         200            200            200

TABLE 2

Classical Item Difficulties and Item-Excluded Biserial Correlations for the Data Sets

        No-Calculator Group        Calculator-1 Group         Calculator-2 Group
Item    Difficulty  Correlation    Difficulty  Correlation    Difficulty  Correlation
  1        .86         .62            .88         .59            .86         .52
  2        .88         .39            .92         .29            .92         .49
  3        .75         .33            .70         .32            .70         .26
  4        .76         .31            .74         .33            .73         .25
  5        .72         .55            .74         .41            .68         .48
  6        .76         .49            .77         .37            .77         .45
  7        .52         .53            .49         .42            .53         .39
  8        .69         .42            .79         .39            .72         .30
  9        .67         .51            .67         .66            .72         .28
 10        .55         .44            .70         .61            .66         .64
 11        .71         .51            .70         .32            .69         .40
 12        .57         .53            .59         .73            .54         .45
 13        .45         .43            .49         .57            .47         .39
 14        .37         .46            .54         .47            .60         .52

TABLE 3

Item Parameter Estimates and Their Estimated Variances and Covariances for the Data Sets

        No-Calculator Group                 Calculator-1 Group                  Calculator-2 Group
Item    a (var)     b (var)       (cov)     a (var)     b (var)       (cov)     a (var)     b (var)       (cov)
  1     1.53 (.17)  -1.73 (.10)   (.10)     1.39 (.15)  -2.01 (.15)   (.13)     1.18 (.11)  -2.06 (.18)   (.12)
  2      .84 (.09)  -2.77 (.68)   (.24)      .66 (.10)  -4.06 (2.86)  (.51)     1.22 (.20)  -2.58 (.40)   (.25)
  3      .63 (.04)  -1.95 (.37)   (.11)      .55 (.04)  -1.69 (.36)   (.10)      .45 (.04)  -2.02 (.80)   (.16)
  4      .56 (.05)  -2.19 (.71)   (.18)      .61 (.05)  -1.91 (.42)   (.12)      .45 (.03)  -2.37 (.96)   (.17)
  5     1.16 (.08)  -1.10 (.06)   (.05)      .80 (.05)  -1.54 (.18)   (.08)      .90 (.05)  -1.01 (.08)   (.04)
  6      .98 (.06)  -1.44 (.11)   (.06)      .77 (.05)  -1.79 (.24)   (.09)      .91 (.07)  -1.55 (.14)   (.08)
  7     1.12 (.06)   -.10 (.03)   (.00)      .86 (.05)    .09 (.04)   (.00)      .68 (.04)   -.20 (.07)   (.01)
  8      .79 (.05)  -1.16 (.12)   (.05)      .78 (.07)  -1.92 (.33)   (.13)      .58 (.04)  -1.79 (.38)   (.11)
  9     1.02 (.06)   -.86 (.06)   (.03)     1.52 (.10)   -.71 (.03)   (.02)      .50 (.03)  -1.98 (.58)   (.12)
 10      .91 (.05)   -.28 (.05)   (.01)     1.39 (.09)   -.87 (.04)   (.03)     1.52 (.13)   -.66 (.03)   (.03)
 11     1.01 (.07)  -1.13 (.09)   (.06)      .58 (.04)  -1.60 (.30)   (.09)      .74 (.04)  -1.22 (.13)   (.05)
 12     1.11 (.08)   -.34 (.04)   (.02)     1.89 (.15)   -.34 (.02)   (.01)      .91 (.05)   -.19 (.05)   (.01)
 13      .84 (.04)    .29 (.05)   (-.01)    1.28 (.09)    .04 (.03)   (.00)      .68 (.04)    .24 (.07)   (-.01)
 14      .94 (.06)    .70 (.05)   (-.03)     .88 (.05)   -.25 (.04)   (.01)     1.05 (.06)   -.51 (.04)   (.02)

TABLE 4

Multi-Group DIF Statistics and Lord's Chi-Squares

       Multi-Group          Lord's Chi-Square                      Pairwise Q
Item     DIF Q       C1 vs NC   C2 vs NC   C1 vs C2     C1 vs NC   C2 vs NC   C1 vs C2
  1        .15          .04        .04        .15          .03        .03        .16
  2       1.76          .35       1.27       1.16          .58       1.27       1.61
  3       1.77         2.45       1.06        .13         1.50       1.06        .07
  4        .42          .58        .23        .24          .30        .23        .07
  5       1.72          .87        .85        .94          .49        .85       1.20
  6        .79          .45        .34        .51          .12        .33        .92
  7       2.04         2.46        .66       1.61          .93        .66       1.76
  8       3.78         2.05        .56       1.58         3.45        .56       1.42
  9       6.76         2.43       1.61       7.57         2.57       1.62       6.09
 10      11.46*        5.74       8.10        .23         8.51**     8.10        .68
 11       1.84         2.14        .27        .41         1.39        .27        .85
 12       4.42         3.32        .18       4.61         3.81        .18       3.19
 13       2.98         1.80        .36       2.49         2.69        .35       1.59
 14      20.20*        5.94      17.81**     2.69         8.70**    17.80**     3.55

*Significant at .05 alpha level; the corresponding critical value is $\chi^2_4 = 9.49$.
**Significant at .017 alpha level; the corresponding critical value is $\chi^2_2 = 8.16$.

TABLE 5

Linking Coefficients and DIF Items on Each Iteration

                       1st Iteration              2nd Iteration              3rd Iteration
Method (Linking)       Coefficients   DIF Item    Coefficients   DIF Item    Coefficients   DIF Item

Multi-Group DIF                       14                         10, 14                     10, 14
  (C1 onto NC)         A = .957                   A = .934                   A = .896
                       B = .196                   B = .1U                    B = .040
  (C2 onto NC)         A = .865                   A = .827                   A = .788
                       B = .101                   B = -.016                  B = ..080

Lord's Chi-Square
 C1 vs NC                             None
  (C1 onto NC)         A = .957
                       B = .196
 C2 vs NC                             14                         14
  (C2 onto NC)         A = .865                   A = .827
                       B = .101                   B = -.016
 C1 vs C2                             None
  (C2 onto C1)         A = .899
                       B = -.101